METHOD AND APPARATUS FOR VIDEO FRAME SEQUENCE-BASED OBJECT TRACKING

BACKGROUND OF THE INVENTION 
RELATED APPLICATIONS

The present invention relates to and claims priority from US
provisional patent application serial number 60/354,209 titled ALARM
SYSTEM BASED ON VIDEO ANALYSIS, filed 6 February 2002. The
present invention also claims priority from and is related to PCT application
serial number PCT/IL02/01042 titled SYSTEM AND METHOD FOR VIDEO
CONTENT-ANALYSIS-BASED DETECTION, SURVEILLANCE, AND
ALARM MANAGEMENT, filed 24 December 2002.

FIELD OF THE INVENTION 
The present invention relates to video surveillance systems in
general, and more particularly to video frame sequence-based object tracking
in video surveillance environments.

DISCUSSION OF THE RELATED ART 
Existing video surveillance systems are based on diverse automatic
object tracking methods. Object tracking methods are designed to process a
captured sequence of temporally consecutive images in order to detect and
track objects that do not belong to the "natural" scene being monitored. Current
object tracking methods typically operate by separating the objects
from the background (by delineating or segmenting the objects) and by
determining the motion vectors of the objects across the sequence of
frames in accordance with the spatial transformations of the tracked objects.
A drawback of the current methods is their inability to track static
objects for a lengthy period of time. Thus, shortly after a previously dynamic
object ceases moving, the tracking of that object is effectively lost. An
additional drawback of the current methods is their inability to handle
"occlusion" situations, such as where the tracked objects are occluded
(partially or entirely) by other objects temporarily passing through or
permanently located between the image acquiring devices and the tracked object.

There is a need for an advanced and enhanced surveillance, object 
tracking and identification system. Such a system would preferably automate
the procedure concerning the identification of an unattended object. Such a
system would further utilize an advanced object tracking method that would 
provide the option of tracking a non-moving object for an operationally 
effective period and would continue tracking objects in an efficient manner 
even where the tracked object is occluded. 

SUMMARY OF THE PRESENT INVENTION

One aspect of the present invention regards an apparatus for the analysis
of a sequence of captured images covering a scene, for detecting and tracking
moving and static objects, and for matching the patterns of object behavior in
the captured images to object behavior in predetermined scenarios. The
apparatus comprises at least one image sequence source for transmitting a
sequence of images to an object tracking program, and an object tracking
program. The object tracking program comprises a pre-processing application
layer for constructing a difference image between a currently captured video
frame and a previously constructed reference image, an objects clustering
application layer for generating at least one new or updated object from the
difference image and at least one existing object, and a background updating
application layer for updating at least one reference image prior to the
processing of a new frame.

A second aspect of the present invention regards a method for the analysis
of a sequence of captured images showing a scene, for detecting and tracking
at least one moving or static object, and for matching the behavior patterns of
the at least one object in the captured images to object behavior in
predetermined scenarios. The method comprises capturing at least one image of
the scene; pre-processing the captured at least one image and generating a
short-term difference image and a long-term difference image; clustering the at
least one moving or static object in the short-term and long-term difference
images; and generating at least one new object and at least one existing object.

BRIEF DESCRIPTION OF THE DRAWINGS 
The present invention will be understood and appreciated more fully
from the following detailed description taken in conjunction with the drawings
in which:

Fig. 1 is a schematic block diagram of the system architecture, in 
accordance with a preferred embodiment of the present invention; 

Fig. 2 is a high-level block diagram showing the application layers
of the object tracking apparatus, in accordance with the preferred embodiment
of the present invention;

Fig. 3 is a block diagram illustrating the components of the
configuration layer, in accordance with the preferred embodiment of the
present invention;

Fig. 4A is a block diagram illustrating the components of the
pre-processing layer, in accordance with the preferred embodiment of the present
invention;

Fig. 4B is a block diagram illustrating the components of the
clustering layer, in accordance with the preferred embodiment of the present
invention;

Fig. 5A is a block diagram illustrating the components of the scene
characterization layer, in accordance with the preferred embodiment of the
present invention;

Fig. 5B is a block diagram illustrating the components of the
background update layer, in accordance with the preferred embodiment of the
present invention;

Fig. 6 is a block diagram showing the data structures associated with
the object tracking apparatus, in accordance with a preferred embodiment of
the present invention;

Fig. 7 illustrates the operation of the object tracking method, in
accordance with the preferred embodiment of the present invention;

Fig. 8 describes the operation of the reference-image learning
routine, in accordance with a preferred embodiment of the present invention;

Fig. 9 shows the input and output data structures associated with the
pre-processing layer, in accordance with a preferred embodiment of the present
invention;

Figs. 10A, 10B and 10C describe the operational steps associated
with the clustering layer, in accordance with the preferred embodiment of the
present invention;

Fig. 11 illustrates the scene characterization, in accordance with the
preferred embodiment of the present invention;

Fig. 12 illustrates the background updating, in accordance with the
preferred embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

An object tracking apparatus and method for the detection and
tracking of dynamic and static objects is disclosed. The apparatus and method
may be utilized in a monitoring and surveillance system. The surveillance
system is operative in the detection of potential alarm situations via
analysis of recorded surveillance content, and in the management of the
detected unattended object situation via an alarm distribution mechanism. The
object tracking apparatus supports the object tracking method, which
incorporates a unique method for detecting, tracking and counting objects
across a sequence of captured surveillance content images. Through the
operation of the object tracking method, the captured content is analyzed, and
the results of the analysis provide the option of activating in real time a set
of alarm messages to a set of diverse devices via a triggering mechanism. In
order to provide the context in which the object tracking apparatus and method
are useful, several exemplary associated applications will be briefly
described. The method of the present invention may be implemented in various
contexts, such as the detection of unattended objects (luggage, vehicles or
persons), identification of vehicles parking or driving in restricted zones,
access control of persons into restricted zones, prevention of loss of objects
(luggage or persons) and counting of persons, as well as in police and fire
alarm situations. Likewise, the object tracking apparatus and method described
herein may be useful in a myriad of other situations and as a video objects
analysis tool.
In the preferred embodiments of the present invention, the
monitored content is a stream of video images recorded by video cameras,
captured and sampled by a video capture device, and transferred to a video
processing unit. Each part of this system may be located in a single device or in
separate devices located in various locations and interconnected by hardwire
or via wireless connection over local, wide area or other networks. The video
processing unit performs a content analysis of the video frames, where the
content analysis is based on the object tracking method. The results of the
analysis could indicate an alarm situation. In other preferred embodiments of
the invention, diverse other content formats are also analyzed, such as the
output of thermal sensor-based cameras, audio, wireless linked cameras, data
produced by motion detectors, and the like.

An exemplary application that could utilize the apparatus and
method of the present invention concerns the detection of unattended objects,
such as luggage, in a dynamic object-rich environment, such as an airport or
city center. Other exemplary applications concern the detection of a vehicle
parked in a forbidden zone, or the extended-period presence of a non-moving
vehicle in a restricted-period parking zone. Forbidden or restricted parking
zones are typically associated with sensitive traffic-intensive locations, such as
a city center. Still other applications that could use the apparatus and method
include the tracking of objects, such as persons, involved in various scenario
models, such as a person leaving the vehicle away from the terminal, which may
constitute a suspicious (unpredicted) behavioral pattern. In other possible
applications, the apparatus and method of the present invention can be
implemented to assist in locating lost luggage and to restrict access of persons
or vehicles to certain zones. Yet other applications could regard the detection
of diverse other objects in diverse other environments. The following
description is not meant to be limiting, and the scope of the invention is
defined only by the attached claims.
Several such applications are described in detail in related PCT patent
application serial number PCT/IL02/01042 titled SYSTEM AND METHOD
FOR VIDEO CONTENT-ANALYSIS-BASED DETECTION,
SURVEILLANCE, AND ALARM MANAGEMENT, filed 24 December
2002, the content of which is incorporated herein by reference.

The method and apparatus of the present invention are operative in
the analysis of a sequence of video images received from a video camera
covering a predefined area, referred to herein below as the video scene. In one
example it may be assumed that the monitored object is a combined object
comprising an individual and a suitcase, where the individual carries the
suitcase. The combined object may be separated into a first separate object and
a second separate object. It is assumed that the individual (second object)
leaves the suitcase (first object) on the floor, a bench, or the like. The first
object remains in the video scene without movement for a pre-defined period of
time. It is assumed that the suitcase (first object) was left unattended. The
second object exits the video scene. It is assumed that the individual (second
object) left the video scene without the suitcase (first object) and is now about
to leave the wider area around the video scene. Following the identification of
the previous sub-events, referred to collectively as the video scene
characteristics, the event will be identified by the system as a situation in
which an unattended suitcase was left in the security-sensitive area. Thus, the
unattended suitcase will be considered a suspicious object. Consequently, the
system of the present invention generates, displays, and/or distributes an
alarm indication.

Likewise, in an alternative embodiment, a first object, such as a suitcase or a
person, is already present and monitored within the video scene. Such an object
can be lost luggage located within the airport, or a monitored person. The
object may merge into a second object. The second object can be a person
picking up the luggage, another person whom the first person joins, or a
vehicle that the first person enters. The first object (now merged with the
second object) may move from its original position and exit the scene or move
in a predetermined prohibited direction. The application will then provide an
indication to a human operator. The indication may be oral, visual or written.
The indication may be provided visually to a screen, delivered via
communication networks to officers located at the scene or off premises, or
delivered via dry contact to an external device such as a siren, a bell, a
flashing or revolving light, and the like.

An additional exemplary application that could utilize the apparatus and method
of the present invention regards the detection of vehicles parked in restricted
areas or moving in restricted lanes. Airports, government buildings, hotels and
other institutions typically forbid vehicles from parking in specific areas or
driving in restricted lanes. In some areas parking is forbidden at all times,
while in other areas parking is allowed for a short period, such as several
minutes. The second exemplary application is designed to detect vehicles
parking in restricted areas for more than a pre-defined number of time units
and to generate an alarm when identifying an illegal parking event of a
specific vehicle. In another preferred embodiment, the system and method of the
present invention can detect whether persons disembark from or embark onto a
vehicle in predefined restricted zones. Other exemplary applications can
include the monitoring of persons and objects in city centers, warehouses,
restricted areas, borders or checkpoints and the like.

It will be readily appreciated that the successful operation of the
above-described applications requires an object tracking apparatus and an
object tracking method. The object tracking method should be capable of
detecting moving objects, tracking moving objects and tracking static objects,
such as objects that are identified as moving and subsequently identified as
non-moving during a lengthy period of time. In order to match the patterns of
object behavior in the captured image sequences to the patterns of object
behavior in the above-described scenarios, the object tracking method should
recognize linked or physically connected objects, recognize the separation of
the linked objects, and track the separated objects while retaining the
historical connectivity states of the objects. The object tracking apparatus
and method should further be able to handle occlusions where the tracked
objects are occluded by one or more separate objects temporarily,
semi-permanently or permanently.

PCT/IL03/00097

WO 03/067884

Referring to Fig. 1, the image sequence sources 12 are one or more
video cameras operating in a security-sensitive environment and covering a
specific pre-defined visual area that is required to be monitored. The area
monitored can be any area, preferably in a transportation area, including an
airport, a city center, a building, and restricted or non-restricted areas
within buildings or outdoors. The image sequence sources 12 could include
analog devices and/or digital devices. The images provided by the image
sequence sources could include normal light, infrared, temperature, or any
other form of radiation. The image sequence sources 12 continuously acquire and
transmit sequences of video images and provide the images simultaneously to an
image sequence display device 20 and to a computing and storage device 15. The
display device 20 could be a video terminal operated by a human operator, or
any other display device, including a display device located on a mobile or
hand-held device. Alarm triggers are generated by the object tracking program
14 installed in the computing and storage device 15 in order to indicate an
alarm situation to the operator of the display device 20. The alarm may be
generated in the form of an audio or any other indication. The image sequence
sources 12 transmit sequences of video images to the object tracking program
14 via suitable wired connections. The images could be provided through an
analog interface, a digital interface, a Local Area Network (LAN) or Wide Area
Network (WAN) interface, or via IP, wireless or satellite connectivity. The
computing and storage device 15 could be an external computing platform, such
as a personal computer (PC), a UNIX workstation or a mainframe computer having
appropriate processing and storage units, or dedicated hardware such as a
DSP-based platform. It is contemplated that future hand-held devices will be
powerful enough to also implement device 15 therein. The device 15 could also
be an array of integrated circuits with built-in digital signal processing
(DSP) and storage capabilities coupled directly to the image sequence sources
12. The device 15 includes a set of object tracking routines constituting the
object tracking program 14 and a set of object tracking control data
structures 16. The object tracking program 14, in association with the object
tracking control data structures 16, receives the image sequence from the image
sequence sources 12 and processes the image sequence in order to detect and
track objects therein. Consequent to the detection of pre-defined
spatio-temporal patterns of behavior associated with the tracked objects across
the image sequences, appropriate alarm triggers are generated and transmitted
to the display device 20.

Still referring to Fig. 1, the object tracking program 14 and the
associated control data structures 16 could be installed in distinct platforms
and/or devices distributed randomly across a Local Area Network (LAN) that
could communicate over the LAN infrastructure or across Wide Area Networks
(WAN). One example is a Radio Frequency Camera that transmits composite
video remotely to a receiving station; the receiving station can be connected
to other components of the system via a network or directly. The program 14 and
the associated control data structures 16 could be installed in distinct
platforms and/or devices distributed randomly across very wide area networks
such as the Internet. Various forms of communication between the constituent
parts of the system can be used, such as a data communication network, which
can be connected via landlines or wireless or like communication devices and
which can be implemented via TCP/IP protocols and like protocols. Other
protocols and methods of communication, such as cellular, satellite, low band,
and high band communications networks and devices, will readily be useful in
the implementation of the present invention. The program 14 and the associated
control data structures 16 could further be co-located on the same computing
platform or distributed across several platforms for load balancing, redundancy
considerations, back-up in the case of equipment failure, and the like.
Although only a single image sequence source and a single computing and storage
device are shown in the drawing under discussion, it will be readily perceived
that in a realistic environment a plurality of image sequence sources could be
connected to a plurality of computing and storage devices. Moreover, two image
sequence sources, each capturing a slightly different scene, may provide a
stereo image sequence source. Likewise, a multiplexed image sequence source
from a plurality of image capturing devices may be used. The object tracking
apparatus comprises an object tracking program and associated object tracking
control data structures.

Referring now to Fig. 2, which is a high-level block diagram
showing the application layers of the object tracking apparatus of the present
invention, the object tracking program 14 includes several application layers.
Each application layer is a group of logically and functionally linked computer
program components responsible for different aspects of the application within
the apparatus of the present invention. The object tracking program 14 includes
a configuration layer 38, a pre-processing layer 42, an objects clustering
layer 44, a scene characterization layer 46, and a background updating layer 48.
Each layer is a computer program executing within the computerized
environment shown in detail in association with the description of Fig. 1. The
configuration layer 38 is responsible for the initialization of the apparatus
of the present invention in accordance with specific user-defined parameters.
The pre-processing layer 42 is operative in constructing difference images
between a currently captured video frame and previously constructed reference
images. The objective of the objects clustering layer 44 is to generate new
and/or updated objects from the difference images and the existing objects. The
scene characterization layer 46 uses the objects generated by the objects
clustering layer 44 to describe the monitored scene. The layer 46 also includes
a triggering mechanism that compares the behavior pattern and other
characteristics of the objects to pre-defined behavior patterns and
characteristics in order to create alarm triggers. The background updating
layer 48 updates the reference images for the processing of the next frame. A
more detailed description of the structure and functionality of the
application layers will be provided herein under in association with the
following drawings.
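The per-frame ordering of these layers can be sketched as a small driver. This is a minimal illustration under stated assumptions, not the patented implementation: the class `ObjectTrackingProgram`, the shared `state` dictionary and the one-argument layer callables are hypothetical, chosen only to show that the configuration layer runs once while the other four layers run in a fixed sequence for every captured frame.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical layer signature: each layer reads and writes a shared state dict.
Layer = Callable[[Dict[str, Any]], None]


@dataclass
class ObjectTrackingProgram:
    """Runs the application layers of Fig. 2 in a fixed per-frame order."""
    preprocess: Layer          # stands in for pre-processing layer 42
    cluster: Layer             # stands in for objects clustering layer 44
    characterize: Layer        # stands in for scene characterization layer 46
    update_background: Layer   # stands in for background updating layer 48
    state: Dict[str, Any] = field(default_factory=dict)

    def configure(self, params: Dict[str, Any]) -> None:
        """Configuration layer 38: runs once, before the first frame."""
        self.state.update(params)

    def process_frame(self, frame: Any) -> List[Any]:
        self.state["frame"] = frame
        self.preprocess(self.state)          # build difference images
        self.cluster(self.state)             # create or update objects
        self.characterize(self.state)        # describe the scene, raise triggers
        self.update_background(self.state)   # refresh references for next frame
        return self.state.setdefault("alarms", [])
```

Keeping each layer behind a plain callable lets one layer (for example, the background updating strategy) be swapped without touching the driver; that separation is a design choice of this sketch only.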
Fig. 3 shows a block diagram illustrating the components of the
configuration layer. The configuration layer 38 comprises a reference image
constructor component 50, a timing parameters definer component 52, and a
visual parameters definer component 54. The reference image constructor
component 50 is responsible for the acquisition of the background model. The
reference image is generated in accordance with a pre-defined option. The
component 50 includes a current frame capture module 56, a reference image
loading module 60, and a reference image learning module 62. In accordance
with the pre-selected option, the reference image may be created alternatively
from a) a currently captured frame, b) an existing reference image, or c) the
reference image learning module. The current frame capture module 56 provides a
currently captured frame to be used as the reference image. The currently
captured frame can be a frame from any camera covering the scene. The reference
image loading module 60 provides the option of loading an existing reference
image located on file locally or remotely. The user may select the appropriate
image from the file and designate it as the reference image. The reference
image learning module 62 provides the option of generating the reference image
adaptively, learned from a consecutive sequence of captured images. The timing
parameters definer component 52 provides time settings information, such as the
number of time units to elapse before the generation of a trigger on a static
object, and the like. The visual parameters definer component 54 provides the
user with the option to define the geometry of the monitored scene. The
component 54 includes a camera tilt setting module 64, a camera zoom setting
module 65, a region location definition module 66, a region type definition
module 67, and an alarm type definition module 68. The module 64 derives the
camera tilt in accordance with measurements taken by a user of an arbitrary
object located at different locations in the monitored scene. The module 65
defines the maximum, the minimum and the typical size of the objects to be
tracked. The region location definition module 66 provides the definition of
the location of one or more regions-of-interest in the scene. The region type
definition module 67 enables the user to define a region of interest as an
"objects track region" or a "no objects track region". The alarm type
definition module 68 defines a region of interest as "trigger alarm in region"
or "no alarm trigger in region", in accordance with the definitions of the user.
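The three reference-image options of component 50 can be sketched as follows. The source does not say how module 62 learns the reference image; the pixel-wise median used here is an assumption, picked because it is a simple, common way to keep transient objects out of a learned background. The function names and the option strings `"current"`, `"load"` and `"learn"` are hypothetical.

```python
from statistics import median
from typing import List, Optional, Sequence

Image = List[List[float]]


def learn_reference_image(frames: Sequence[Image]) -> Image:
    """Option c): learn a reference image from consecutive captured frames.

    Assumption: a pixel-wise median, which largely ignores objects that
    cover a pixel in fewer than half of the learning frames.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median(f[r][c] for f in frames) for c in range(cols)]
            for r in range(rows)]


def build_reference_image(option: str,
                          current: Optional[Image] = None,
                          stored: Optional[Image] = None,
                          learning_frames: Sequence[Image] = ()) -> Image:
    """Select one of the three pre-defined options of component 50."""
    if option == "current":    # a) currently captured frame (module 56)
        source = current
    elif option == "load":     # b) existing reference image (module 60)
        source = stored
    elif option == "learn":    # c) reference image learning (module 62)
        return learn_reference_image(learning_frames)
    else:
        raise ValueError(f"unknown option: {option}")
    if source is None:
        raise ValueError(f"option {option!r} needs an input image")
    return [row[:] for row in source]  # copy so later updates do not alias it
```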
Reference is now made to Fig. 4A, which shows a block diagram illustrating the
components of the pre-processing layer, in accordance with the preferred
embodiment of the present invention. The pre-processing layer 42 comprises a
current frame handler 212, a short-term reference image handler 214, a
long-term reference image handler 216, a pre-processor module 218, a short-term
difference image updater 220, and a long-term difference image updater 222.
Each module is a computer program operative to perform one or more tasks in
association with the computerized system of Fig. 1. The current frame handler
212 obtains a currently captured frame and passes the frame to the
pre-processor module 218. The short-term reference image handler 214 loads an
existing short-term reference image and passes the image to the pre-processor
module 218. The handler 214 could further provide calculations concerning the
moments of the short-term reference image. The long-term reference image
handler 216 loads an existing long-term reference image and passes the image to
the pre-processor module 218. The handler 216 could further provide
calculations concerning the moments of the long-term reference image.

The pre-processor module 218 uses the current frame and the
obtained reference images as input for processing. The process generates a new
short-term difference image and a new long-term difference image and
subsequently passes the new difference images to the short-term difference
image updater (handler) 220 and the long-term difference image updater
(handler) 222, respectively. Using the new difference images, the updater 220
and the updater 222 update the existing short-term reference image and the
existing long-term reference image, respectively.
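A minimal sketch of the pre-processor step, assuming grey-level images stored as nested lists. The document's "sophisticated absolute distance" (SAD) computation is not specified here, so a plain per-pixel absolute difference followed by a fixed threshold is used in its place; the threshold value and the function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]
Mask = List[List[bool]]


def abs_difference(frame: Image, reference: Image) -> Image:
    """Per-pixel |frame - reference| (a plain stand-in for the SAD maps)."""
    return [[abs(p - q) for p, q in zip(f_row, r_row)]
            for f_row, r_row in zip(frame, reference)]


def threshold(diff: Image, level: int) -> Mask:
    """Mark pixels whose difference exceeds `level` as foreground."""
    return [[d > level for d in row] for row in diff]


def preprocess(frame: Image, short_ref: Image, long_ref: Image,
               level: int = 20) -> Tuple[Mask, Mask]:
    """Return the (short-term, long-term) foreground masks for one frame."""
    short_diff = abs_difference(frame, short_ref)
    long_diff = abs_difference(frame, long_ref)
    return threshold(short_diff, level), threshold(long_diff, level)
```

Because the short-term reference image contains the tracked static objects while the long-term reference image does not, a parked object already copied into the short-term reference drops out of the short-term mask but stays visible in the long-term mask. With a long-term reference `[[0, 0, 0]]`, a short-term reference `[[0, 50, 0]]` and a frame `[[0, 50, 90]]`, the sketch returns the short-term mask `[[False, False, True]]` and the long-term mask `[[False, True, True]]`.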

Reference is now made to Fig. 4B, which shows a block diagram illustrating the
components of the clustering layer, in accordance with the preferred
embodiment of the present invention. The clustering layer 44 comprises an
object merger module 231, an objects group builder module 232, an objects
group adjuster module 234, a new objects creator module 236, an object
searcher module 240, a Kalman filter module 242, and an object status updater
254. Each module is a computer program operative to perform one or more
tasks in association with the computerized system of Fig. 1. The object merger
module 231 corrects clustering errors by successively merging partially
overlapping objects that have had the same motion vector for a pre-defined
period. The objects group builder 232 is responsible for creating groups of
close objects by using neighborhood relations among the objects. The objects
group adjuster 234 initiates a group adjustment process in order to find the
optimal spatial parameters of each object in a group. The new objects creator
module 236 constructs new objects from the difference images, controls the
operation of a specific object location and size finder function, and adjusts
new objects. New objects may be constructed from the difference images whether
or not there are existing objects to compare against. For example, when the
system begins operation, a new object may be identified even if there are no
previously acquired existing objects. The object searcher 240 scans a discarded
objects archive in order to attempt to locate recently discarded objects with
parameters (such as spatial parameters) similar to those of a newly created
object.
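The object searcher 240 can be illustrated with a simple archive lookup. This is a hypothetical sketch: the source does not define "recent" or "similar", so the age limit, the maximum centre shift and the size-ratio bound below are placeholder assumptions, as are the names `ArchivedObject` and `search_archive`.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ArchivedObject:
    object_id: int
    cx: float            # bounding-box centre, x
    cy: float            # bounding-box centre, y
    width: float
    height: float
    discarded_at: int    # frame index at which the object was archived


def search_archive(cx: float, cy: float, width: float, height: float,
                   archive: List[ArchivedObject], now: int,
                   max_age: int = 50, max_shift: float = 10.0,
                   max_size_ratio: float = 1.3) -> Optional[ArchivedObject]:
    """Return the closest recently discarded object with similar geometry."""
    best: Optional[ArchivedObject] = None
    best_dist = math.inf
    for old in archive:
        if now - old.discarded_at > max_age:
            continue  # only recently discarded objects are candidates
        if min(width, height, old.width, old.height) <= 0:
            continue  # degenerate boxes cannot be compared by size ratio
        dist = math.hypot(cx - old.cx, cy - old.cy)
        w_ratio = max(width, old.width) / min(width, old.width)
        h_ratio = max(height, old.height) / min(height, old.height)
        if (dist <= max_shift and w_ratio <= max_size_ratio
                and h_ratio <= max_size_ratio and dist < best_dist):
            best, best_dist = old, dist
    return best
```

One plausible use of a match is to let the newly created object take over the archived object's identity, so that its history is preserved across a short gap in detection.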

In order to improve the accuracy of the tracking and to reduce
the computing load, a Kalman filter module 242 is utilized to track the motion
of the objects. The object status updater 254 is responsible for modifying the
status of an object from "static" to "dynamic" or from "dynamic" to "static". A
detailed description of the clustering layer 44 will be set forth herein under
in association with the following drawings.
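The source states only that a Kalman filter module tracks object motion. The sketch below assumes a constant-velocity motion model applied independently to the x and y coordinates of an object's centroid; the noise values `q` and `r`, the initial covariance, and the class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AxisKalman:
    """1-D constant-velocity Kalman filter with state (position, velocity)."""
    pos: float
    vel: float = 0.0
    p00: float = 100.0   # state covariance, stored as a symmetric 2x2
    p01: float = 0.0
    p11: float = 100.0
    q: float = 0.01      # process noise added to the diagonal (simplified)
    r: float = 4.0       # measurement noise variance

    def predict(self, dt: float = 1.0) -> float:
        # x <- F x and P <- F P F^T + Q, with F = [[1, dt], [0, 1]]
        self.pos += dt * self.vel
        self.p00 += 2 * dt * self.p01 + dt * dt * self.p11 + self.q
        self.p01 += dt * self.p11
        self.p11 += self.q
        return self.pos

    def update(self, z: float) -> float:
        # Measurement model z = H x + noise, with H = [1, 0]
        s = self.p00 + self.r
        k0, k1 = self.p00 / s, self.p01 / s
        innovation = z - self.pos
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        p00, p01, p11 = self.p00, self.p01, self.p11
        self.p00 = (1 - k0) * p00
        self.p01 = (1 - k0) * p01
        self.p11 = p11 - k1 * p01
        return self.pos


class CentroidKalman:
    """Tracks an object's centroid with independent x and y filters."""

    def __init__(self, x: float, y: float) -> None:
        self.fx, self.fy = AxisKalman(x), AxisKalman(y)

    def predict(self, dt: float = 1.0) -> Tuple[float, float]:
        return self.fx.predict(dt), self.fy.predict(dt)

    def update(self, x: float, y: float) -> Tuple[float, float]:
        return self.fx.update(x), self.fy.update(y)
```

The centroid returned by `predict()` can bound the region searched for the object in the next frame, which is one way such a filter could reduce the computing load.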

Reference is now made to Fig. 5A, which shows a block diagram illustrating the
components of the scene characterization layer, in accordance with the
preferred embodiment of the present invention. The scene characterization
layer 46 comprises an object movement measurement module 242, an object
merger module 244, and a triggering mechanism 246. The object movement
measurement module 242 analyzes the changes in the spatial parameters of an
object and determines whether the object is moving or stationary. The object
merger module 244 is responsible for correcting errors in objects resulting
from the clustering stage. The function of the triggering mechanism 246 is to
check each object against the spatio-temporal behavior patterns and properties
defined as "suspicious" or as alarm triggering. When a suitable match is found,
the mechanism 246 generates an alarm trigger. The operation of the scene
characterization layer 46 will be described herein under in association with
the following drawings.
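A hypothetical sketch of the movement measurement module and the triggering mechanism for the unattended-object scenario. It assumes that the moving/stationary decision compares successive centroids against a pixel threshold, and that the trigger combines a static-object time limit (counted here in frames) with the "trigger alarm in region" setting of module 68. The names, the threshold and the firing rule are illustrative, not taken from the source.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TrackedObject:
    object_id: int
    centroids: List[Tuple[float, float]] = field(default_factory=list)
    non_moving_counter: int = 0
    status: str = "dynamic"


def is_moving(obj: TrackedObject, max_shift: float = 2.0) -> bool:
    """Movement measurement: did the latest centroid shift more than max_shift?"""
    if len(obj.centroids) < 2:
        return True  # a newly seen object is treated as moving
    (x0, y0), (x1, y1) = obj.centroids[-2], obj.centroids[-1]
    return math.hypot(x1 - x0, y1 - y0) > max_shift


def characterize(obj: TrackedObject, in_alarm_region: bool,
                 static_trigger_frames: int) -> bool:
    """Update the object's motion state; return True when a trigger fires."""
    if is_moving(obj):
        obj.non_moving_counter = 0
        obj.status = "dynamic"
    else:
        obj.non_moving_counter += 1
        obj.status = "static"
    # Fire once, on the frame where the non-moving count reaches the limit.
    return in_alarm_region and obj.non_moving_counter == static_trigger_frames
```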

Reference is now made to Fig. 5B, which shows a block diagram illustrating the
components of the background update layer, in accordance with the preferred
embodiment of the present invention. The background updating layer 48
comprises a background draft updater 248, a short-term reference image
updater 250, and a long-term reference image updater 252. The function of
the updater 248 is to continuously update the background or reference "draft
frame" from the current frame. The short-term reference image updater 250 and
the long-term reference image updater 252 maintain the short-term reference
image and the long-term reference image, respectively. A detailed description
of the operation of the background updating layer 48 will be provided herein
under in association with the following drawings.
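The background draft updater 248 takes every pixel of every frame into account. One simple model with that property, assumed here rather than taken from the source, is an exponential running average; the blending factor `alpha` and the function names are hypothetical.

```python
import math
from typing import List

Image = List[List[float]]


def update_draft(draft: Image, frame: Image, alpha: float = 0.05) -> Image:
    """Blend every pixel of the current frame into the background draft.

    draft <- (1 - alpha) * draft + alpha * frame
    """
    return [[(1 - alpha) * d + alpha * f for d, f in zip(d_row, f_row)]
            for d_row, f_row in zip(draft, frame)]


def frames_to_absorb(alpha: float, fraction: float) -> int:
    """Frames a motionless object needs to reach `fraction` of its full
    contribution to the draft, from 1 - (1 - alpha) ** n >= fraction."""
    return math.ceil(math.log(1 - fraction) / math.log(1 - alpha))
```

Under this model a motionless object reaches a fraction 1 - (1 - alpha)^n of its full contribution to the draft after n frames, so with `alpha = 0.05` it takes 45 frames to reach 90%.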

Referring now to Fig. 6, which shows a block diagram of the data
structures associated with the object tracking apparatus, in accordance with a
preferred embodiment of the present invention. The object tracking control
structures 16 of Fig. 1 comprise a long-term reference image 70, a short-term
reference image 72, an objects table 74, a sophisticated absolute difference
(SAD) short-term map 76, a sophisticated absolute difference (SAD) long-term
map 78, a discarded objects archive 82, and a background draft 84. The
long-term reference image 70 includes the background image of the monitored
scene without the dynamic and without the static objects tracked by the
apparatus and method of the present invention. The short-term reference image
72 includes the scene background image and the static objects tracked by the
object tracking method. The objects table 74 includes a list of dynamic and
static objects with associated object data and object meta data. The object data
includes object identification, object status, and various control fields, such as
a non-moving counter, a non-moving-time counter, and the like. The meta data
comprises information concerning the current spatial parameters, the properties,
and the motion vector data of the objects, acquired from the processing
previously performed on a succession of previous frames. The short-term and
long-term SAD maps 76, 78 represent the difference between a currently
captured frame and the short-term and long-term reference images 72, 70,
respectively. The discarded objects archive 82 stores discarded objects for
object history. The background draft 84 (also referred to as the reference
draft frame, as distinct from the short-term and long-term reference images) is
a constantly changing image of the monitored scene. Each pixel within each
current frame is taken into consideration when calculating the background
draft 84. The draft 84 is used for inserting "static" objects into the short-term
reference image 72. The background draft 84 continuously tracks the scene
background: if an object enters the monitored scene, the object is inserted into
the background draft 84. When the method determines that the object is a
"static" object (after the object was perceived as stationary across a pre-defined
number of captured frames), the pixels of the object are copied from the
background draft 84 to the short-term reference image 72.
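The interplay between the background draft 84 and the short-term reference image 72 can be sketched as follows. This is a minimal illustration, not the patented implementation: the source does not state how the draft is computed from each frame, so the exponential running average and its rate `alpha` are assumptions, as are the hypothetical function names `update_draft` and `promote_static_object`; images are taken to be NumPy arrays.

```python
import numpy as np

def update_draft(draft, frame, alpha=0.05):
    """Blend every pixel of the current frame into the background draft.

    Assumption: an exponential running average with rate alpha; the source
    only states that every pixel of every frame contributes to the draft.
    """
    return (1.0 - alpha) * draft + alpha * frame.astype(np.float64)

def promote_static_object(short_term_ref, draft, object_mask):
    """Copy the pixels of an object that became static from the draft
    into a copy of the short-term reference image."""
    updated = short_term_ref.copy()
    updated[object_mask] = draft[object_mask]
    return updated

# Usage: an object (value 200) appears and stays still in a gray scene.
scene = np.full((4, 4), 100.0)
draft = scene.copy()
frame = scene.copy()
frame[1:3, 1:3] = 200.0
for _ in range(60):                      # the object is stationary for 60 frames
    draft = update_draft(draft, frame)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
short_term = promote_static_object(scene, draft, mask)
```

With a slow rate, an object that merely passes through barely affects the draft, while an object that stays put converges toward its own appearance, which is what makes copying its pixels into the short-term reference image meaningful.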

Referring now to Fig. 7, the object tracking module operates by
detecting objects across a temporally ordered sequence of consecutively
captured images, where the objects do not belong to the "natural" or "static"
monitored scene. The object tracking module operates through the use of a
central processing unit (not shown) utilizing data structures (not shown). The
data structures are maintained on one or more memory or storage devices
installed across a hardware environment supporting the application. Fig. 7
illustrates the various steps in the operation of the object tracking method. The
configuration step (not shown) is performed prior to the beginning of the
tracking (steps 88 through 94). In the configuration step the object tracking
module is provided with reference images, with timing parameters, and with
visual parameters, such as region-of-interest definitions. The provided
information enables the method to decide which regions of the frame to work
on and in which regions an alert situation should be produced. The
configuration step optionally includes a reference image learning step (not
shown) in which the background image is adaptively learned in order to
construct a long-term and a short-term reference image from a temporally
consecutive sequence of captured images. When no stationary objects were
detected in the last frames, the long-term reference image is copied and
maintained as a short-term reference image. The long-term reference image
contains no objects, while the short-term reference image includes static
objects, that is, objects that have been static for a pre-defined period. In the
preferred embodiment of the invention the length of the pre-defined period is
one minute, while in other preferred embodiments other time values could be
used. The long-term reference image and the short-term reference image are
updated for background changes, such as changes in the illumination artifacts
associated with the image (lights or shadows), constantly moving objects (such
as trees), and the like. The video frame pre-processing phase 88 uses a currently
captured frame and the short-term and long-term reference images for
generating new short-term and long-term difference images. The difference
images represent the difference between the currently captured frame and the
reference images. The reference images can be obtained from one of the image
sequence sources described in association with Fig. 1 or could be provided
directly by a user or by another system associated with the system of the
present invention. The difference images are suitably filtered or smoothed.
The clustering phase 90 generates new or updated objects from the difference
images and from the previously generated or updated objects. The scene
characterization phase 92 uses the objects received from the clustering phase 90
in order to describe the scene. The background updating step 94 updates the
short-term and long-term reference images for the next frame calculation. Note
should be taken that in other preferred embodiments of the invention other
similar or different processes could be used to accomplish the underlying
objectives of the method of the present invention.

Note should be taken that the proposed apparatus and method are
capable of functioning in specific situations where an image
acquiring device, such as a video camera, is not static. Examples of such
situations include a pole-mounted outdoor camera operating in windy
conditions or a mobile camera physically tracking a moving object. For such
situations the object tracking method requires a pre-pre-processing phase
configured so as to compensate for the potential camera movements between
the capture of the reference images and the capture of each current frame. The
pre-pre-processing phase involves an estimation of the relative overall frame
movement (registration) between the current frame and the reference images.
Consequent to the estimation of the registration (in terms of pixel offset), the
offset is applied to the reference images in order to extract "in-place" reference
images, so that the object tracking can proceed in the usual manner. As a result,
extended reference images have to be used, allowing for margins (the content
of which may be constantly updated) up to the maximal expected registration
offset.

The estimation of the registration (offset) between the current frame
and the reference images involves a separate estimation of the x and y offset
components, and a joint estimation of the x and y offset components. For the
separate estimation, selected horizontal and vertical stripes of the current frame
and the reference images are averaged with appropriate weighting, and
cross-correlated in search of a maximum match in the x and y offsets,
respectively. For the joint estimation, diagonal stripes are used (in both
diagonal directions), from which the x and y offsets are jointly estimated. The
resulting estimates are then averaged to produce the final estimate.
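The separate (per-axis) estimation can be sketched with one-dimensional luminance profiles. This sketch makes several simplifying assumptions that are not in the source: it averages whole frames rather than selected stripes, weights all rows and columns equally, scores each candidate shift by normalized cross-correlation, and omits the diagonal-stripe joint estimate; the names `profile_offset` and `estimate_offset` are hypothetical.

```python
import numpy as np

def profile_offset(cur, ref, max_shift):
    """Integer shift s maximizing the correlation of cur[i] with ref[i - s]."""
    n = len(cur)
    best_shift, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        lo, hi = max(0, s), min(n, n + s)        # indices where both samples exist
        a = cur[lo:hi] - cur[lo:hi].mean()
        b = ref[lo - s:hi - s] - ref[lo - s:hi - s].mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        score = (a * b).sum() / denom if denom > 0 else -np.inf
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift

def estimate_offset(frame, reference, max_shift=8):
    """Return (dx, dy) such that frame[y, x] ~ reference[y - dy, x - dx]."""
    f = frame.astype(np.float64)
    r = reference.astype(np.float64)
    dx = profile_offset(f.mean(axis=0), r.mean(axis=0), max_shift)  # column profile
    dy = profile_offset(f.mean(axis=1), r.mean(axis=1), max_shift)  # row profile
    return dx, dy
```

Collapsing each axis to a profile reduces the search from a 2-D correlation to two 1-D scans, which is the computational advantage of estimating the x and y components separately.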

Referring now to Fig. 8, which describes the operation of the
reference image learning routine, in accordance with a preferred embodiment
of the present invention. The construction of the long-term and short-term
reference images could be carried out in several alternative ways. A currently
captured frame could be stored on a memory device as the long-term reference
image. Alternatively, a previously stored long-term reference image could be
loaded from the memory device in order to be used as the current long-term
reference image. Alternatively, a specific reference image learning
process could be activated (steps 100 through 114). In step 100 the
reference image learning process is performed across a temporally consecutive
sequence of captured images, where each of the frames is divided into macro
blocks (MB) having a pre-defined size, such as 16×16 pixels or 32×32 pixels,
or any other similar division into macro blocks. Next, at step 102 each MB is
examined for motion vectors. The motion is detected by comparing the MB in a
specific position in the currently captured frame to the MB in the same position
in the previously captured frame. The comparison is performed during the
encoding step by using similar information generated therein for video data
compression purposes. According to the result of the examination, each MB is
marked as being in one of the following three states: a) Motion MB 108, where
a motion vector is detected in the current MB relative to the parallel MB in the
previously captured frame; b) Undefined MB 104, where no motion vector is
detected in the MB relative to the parallel MB in the previously captured frame,
but a motion vector was detected across a previously captured set of temporally
consecutive frames, where the sequence is defined as having a pre-defined
number of frames (in the preferred embodiment of the invention the number of
frames in the sequence is about 150, while in other preferred embodiments of
the invention different values could be used); c) Background MB 106, where no
motion vector was detected across the previously captured sequence of
temporally consecutive frames. In step 110 the values of each of the pixels in an
MB that was identified as a Background MB are obtained, and in step 112 the
values are averaged in time. In step 114 an initial short-term and long-term
reference image is generated from the time-averaged values. In order to avoid
undetermined values for pixels in MBs that were always in motion, such as an
MB in which there was constant motion (trees moving in the wind), in step 114
the short-term reference image is created such that it contains the time averages
of the pixel values. Subsequently, the pixels are examined in order to find which
pixels had insufficient background time (MBs that were always in motion).
Pixels without sufficient background time are given the value from the
short-term reference image.
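The three-state macro-block classification of steps 102 to 108 and the averaging of steps 110 to 114 can be sketched as follows. Assumptions not in the source: per-MB motion flags are supplied as a boolean array (in the source they come from the encoder's motion vectors), a block has "sufficient background time" once it was classified as Background in at least `min_count` frames, and the names `classify_mb` and `learn_reference` are hypothetical. Frame height and width must be multiples of the MB size.

```python
import numpy as np

MOTION, UNDEFINED, BACKGROUND = 0, 1, 2

def classify_mb(motion_history, window=150):
    """Classify each macro block from boolean motion flags.

    motion_history: bool array (frames, mb_rows, mb_cols); the last entry
    is the current frame.
    """
    current = motion_history[-1]
    recent = motion_history[-window:].any(axis=0)   # motion in the last `window` frames
    state = np.full(current.shape, BACKGROUND, dtype=np.int8)
    state[recent] = UNDEFINED                        # still now, but moved recently
    state[current] = MOTION                          # moving in the current frame
    return state

def learn_reference(frames, motion, mb=16, window=150, min_count=1):
    """Return (long_term, short_term) reference images.

    frames: (T, H, W) luminance frames; motion: (T, H // mb, W // mb) flags.
    """
    frames = np.asarray(frames, dtype=np.float64)
    total = np.zeros(frames.shape[1:])
    count = np.zeros(frames.shape[1:])
    for t in range(len(frames)):
        state = classify_mb(motion[: t + 1], window)
        bg = np.repeat(np.repeat(state == BACKGROUND, mb, axis=0), mb, axis=1)
        total += np.where(bg, frames[t], 0.0)       # accumulate Background-MB pixels
        count += bg
    short_term = frames.mean(axis=0)                 # plain time average of every pixel
    long_term = np.where(count >= min_count,
                         total / np.maximum(count, 1), short_term)
    return long_term, short_term
```

Pixels whose block never reached the Background state fall back to the plain time average, mirroring the fallback to the short-term reference image described above.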

Referring now to Fig. 9, which shows the input and output data structures
associated with the pre-processing layer, in accordance with a preferred
embodiment of the present invention. The pre-processing step 88 of Fig. 7
employs the current frame 264 and the short-term reference image 262 to
generate a short-term difference image 270. The step 88 further uses the current
frame 264 and the long-term reference image 266 to generate a long-term
difference image 272. The long-term 272 and short-term 270 difference images
represent, respectively, the sophisticated absolute difference (SAD) between the
current frame 264 and the long-term 266 and the short-term 262 reference
images. The size of the difference images (referred to hereinafter as SAD
maps) 270, 272 is equal to the size of the current frame 264. Each pixel in the
SAD maps 270, 272 is assigned a value in the range of 1 through 100. Other
values may be used instead. High values indicate a substantial difference
between the value of the pixel in the reference images 262, 266 and the value of
the pixel in the currently captured frame 264. Thus, the score indicates the
probability that the pixel belongs to an object rather than to the scene
background. The generation of the SAD maps 270, 272 is achieved by one of
two alternative methods.

Still referring to Fig. 9, in the first pre-processing method, for each
specific pixel in the currently captured frame 264, the absolute difference
between the specific pixel and the matching pixel in the reference images 262,
266 is calculated, where the calculation takes into account the average pixel
value:

(1)
$$D(x,y) = a_0 \cdot Y_{\min}(x,y) + a_1 \cdot Y_{\max}(x,y) + a_3$$

In the above equation, x and y are the pixel coordinates. Ymin and
Ymax represent the lower and the higher luminance levels at (x, y) between the
current frame 264 and the reference images 262, 266. The values a0, a1, and a3
are thresholds designed to minimize D(x, y) for similar pixels and to maximize
it for non-similar pixels. Consequent to the evaluation of the above equation for
each of the pixels and to the generation of the SAD maps 270, 272, the SAD
maps 270, 272 are smoothed with two Gaussian filters, one along the X
coordinate and the second along the Y coordinate.
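A sketch of the first pre-processing method follows. The source does not publish a0, a1, a3 or the Gaussian width, so the defaults below are hypothetical: with a0 = -1, a1 = 1 and a3 = 0, D reduces to the plain absolute difference Ymax - Ymin. The 5-tap kernel, the edge replication at the borders, and the function names are also assumptions; arrays are indexed [y, x].

```python
import numpy as np

def gaussian_kernel(sigma=1.0, radius=2):
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()                      # normalized, so constants are preserved

def smooth_separable(img, kernel):
    """Filter along X (within each row), then along Y, replicating edges."""
    pad = len(kernel) // 2
    tmp = np.pad(img, ((0, 0), (pad, pad)), mode="edge")
    tmp = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, tmp)
    tmp = np.pad(tmp, ((pad, pad), (0, 0)), mode="edge")
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, tmp)

def sad_map_method1(cur, ref, a0=-1.0, a1=1.0, a3=0.0, sigma=1.0):
    """Equation (1) per pixel, followed by separable Gaussian smoothing."""
    cur = cur.astype(np.float64)
    ref = ref.astype(np.float64)
    y_min = np.minimum(cur, ref)            # lower luminance at (x, y)
    y_max = np.maximum(cur, ref)            # higher luminance at (x, y)
    d = a0 * y_min + a1 * y_max + a3        # hypothetical a0, a1, a3 defaults
    return smooth_separable(d, gaussian_kernel(sigma))
```

Separating the 2-D Gaussian into an X pass and a Y pass, as the text describes, costs two short 1-D convolutions per pixel instead of one full 2-D window.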

In the second alternative pre-processing method, the following values are
calculated around each pixel P(x, y), using a 5×5 pixel neighboring window for
filtering. This step could be referred to as calculating the moments of each
pixel.
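(2)
$$M_{00}(x,y) = \sum_{i=x-2}^{x+2}\sum_{j=y-2}^{y+2} P(i,j)$$
$$M_{10}(x,y) = \frac{32\sum_{i=x-2}^{x+2}\sum_{j=y-2}^{y+2} (i-x)\,P(i,j)}{M_{00}(x,y)}$$
$$M_{01}(x,y) = \frac{32\sum_{i=x-2}^{x+2}\sum_{j=y-2}^{y+2} (j-y)\,P(i,j)}{M_{00}(x,y)}$$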

The results of the equations represent the following values: a) M00
is the sum of all the pixels around the given pixel; b) M10 is the sum of all the
pixels around the given pixel, each multiplied by a filter that detects horizontal
edges; and c) M01 is the sum of all the pixels around the given pixel, each
multiplied by a filter that detects vertical edges. Next, the absolute differences
between these three values in the current frame 264 and in the reference images
262, 266 are calculated. In addition, the minimum of M00curr and M00ref is
calculated:

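(3)
$$D_{00}(x,y) = \left|M_{00}^{curr}(x,y) - M_{00}^{ref}(x,y)\right|$$
$$D_{10}(x,y) = \left|M_{10}^{curr}(x,y) - M_{10}^{ref}(x,y)\right|$$
$$D_{01}(x,y) = \left|M_{01}^{curr}(x,y) - M_{01}^{ref}(x,y)\right|$$
$$Min(x,y) = \min\left(M_{00}^{curr}(x,y),\ M_{00}^{ref}(x,y)\right)$$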
Next, the following equations are used to construct the desired SAD
maps 270, 272:

(4)
$$Tmp_1(x,y) = A_0 \cdot \left(D_{00}(x,y) + W_0\right) - Min(x,y)$$
$$Tmp_2(x,y) = A_1 \cdot D_{10}(x,y) + W_1$$
$$Tmp_3(x,y) = A_1 \cdot D_{01}(x,y) + W_1$$
$$A_0 = 15, \quad A_1 = 25, \quad W_0 = -40, \quad W_1 = -44$$



(5)
$$Tmp_k(x,y) = \max\left(-32,\ \min\left(32,\ Tmp_k(x,y)\right)\right), \quad k = 1, 2, 3$$



(6)
$$TmpSADMap(x,y) = \frac{3 \cdot \left(Tmp_1(x,y) + Tmp_2(x,y) + Tmp_3(x,y) + 32\right)}{64}$$

Through a convolution calculation, the grade of each pixel is
calculated while taking into consideration the values of its neighboring pixels:
(7)
$$SADMap(x,y) = 1 + \sum_{i=x-2}^{x+2}\sum_{j=y-2}^{y+2} TmpSADMap(i,j)$$
$$SADMap(x,y) = \min\left(SADMap(x,y),\ 100\right)$$

The method takes into consideration the texture of the current frame
264 and of the reference images 262, 266 and compares them. The
second pre-processing method is favorable since it is less sensitive to light
changes.
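The second method, equations (2) through (7), can be sketched in NumPy as below, following the reading of those equations given above. Assumptions not in the source: arrays are indexed as P[x, y] to mirror the equations, borders are handled by edge replication, a zero M00 is guarded against, and the final map is also clamped to at least 1 so that it matches the stated range of 1 through 100 (equation (7) only applies the upper bound). The function names are hypothetical.

```python
import numpy as np

A0, A1, W0, W1 = 15.0, 25.0, -40.0, -44.0      # constants of equation (4)

def window_sum(a, weight, r=2):
    """Sum of weight(di, dj) * a[x + di, y + dj] over a (2r+1)x(2r+1) window."""
    p = np.pad(a.astype(np.float64), r, mode="edge")    # assumption: edge replication
    h, w = a.shape
    out = np.zeros((h, w))
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            out += weight(di, dj) * p[r + di:r + di + h, r + dj:r + dj + w]
    return out

def moments(P):
    """M00, M10 and M01 of equation (2)."""
    m00 = window_sum(P, lambda di, dj: 1.0)
    safe = np.where(m00 == 0, 1.0, m00)                 # avoid division by zero
    m10 = 32.0 * window_sum(P, lambda di, dj: di) / safe
    m01 = 32.0 * window_sum(P, lambda di, dj: dj) / safe
    return m00, m10, m01

def sad_map_method2(cur, ref):
    c00, c10, c01 = moments(cur)
    r00, r10, r01 = moments(ref)
    d00, d10, d01 = np.abs(c00 - r00), np.abs(c10 - r10), np.abs(c01 - r01)  # (3)
    mn = np.minimum(c00, r00)
    tmp1 = np.clip(A0 * (d00 + W0) - mn, -32, 32)       # (4) and (5)
    tmp2 = np.clip(A1 * d10 + W1, -32, 32)
    tmp3 = np.clip(A1 * d01 + W1, -32, 32)
    tmp_sad = 3.0 * (tmp1 + tmp2 + tmp3 + 32.0) / 64.0  # (6)
    sad = 1.0 + window_sum(tmp_sad, lambda di, dj: 1.0) # (7)
    return np.clip(sad, 1.0, 100.0)     # lower clamp to 1 is an assumption
```

Under this reading, a uniform brightness change saturates Tmp1 but leaves the texture terms Tmp2 and Tmp3 at their floor, so the score stays at 1, whereas a newly appearing edge drives all the first-order terms up.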

At the price of increased computational cost, and in order to achieve a
more accurate model, higher moments could optionally be calculated.
Calculating higher moments involves the evaluation of the following set of
equations:

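(8)
$$M_{20}(x,y) = \frac{32\sum_{i=x-2}^{x+2}\sum_{j=y-2}^{y+2} (i-x)^2\,P(i,j)}{M_{00}(x,y)}$$
$$M_{02}(x,y) = \frac{32\sum_{i=x-2}^{x+2}\sum_{j=y-2}^{y+2} (j-y)^2\,P(i,j)}{M_{00}(x,y)}$$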

It will be easily perceived that the method could be broadened to
even higher moments. Several equations of the second pre-processing method
represent a simulation of a neural network.

The pre-processing step produces several outputs that are used by the
clustering step. Such outputs are the short-term and long-term SAD maps.
Each pixel in the SAD maps is assigned a value in the range of 1 through 100.
High values indicate a great difference between the value of the pixel in the
reference images and the value of the pixel in the current frame. The purpose of
the clustering is to cluster the high-difference pixels into objects. Referring now
to Fig. 10A, the clustering step 120 includes a two-stage Kalman filtering, two
major processing sections, and an object status updating. In order to improve
the accuracy of the tracking and to reduce the computing load, a Kalman
filter is used to track the motion of the objects. The Kalman filter is performed
in two steps. The prediction step 120 is performed before the adjustment of the
objects, and the update step 125 is performed after the creation of new objects.
The Kalman state of the object is updated in accordance with the adjusted
parameters of the object. At step 204 the status of the object is updated. The
change of the object status from "dynamic" to "static" is performed as follows:
if the value of the non-moving counter associated with the object exceeds a
specific threshold, the object is converted to a "static" object. The dead area
(described in the clustering step) is calculated and saved. The pixels that are
bounded within the object are copied from the background draft to the
short-term reference image. Subsequently, the status of the object is set to
"static". Static objects are not adjusted until their status is changed back to
"dynamic".
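The two-stage Kalman filtering can be illustrated with a textbook constant-velocity filter on the object center. The source does not specify the state vector or the noise covariances; the state (x, y, vx, vy), the values of `q` and `r`, and the class name `CenterKalman` are assumptions. `predict` corresponds to the prediction step run before the adjustment, and `update` to the update step fed with the adjusted object parameters.

```python
import numpy as np

class CenterKalman:
    """Constant-velocity Kalman filter for an object center (assumed model)."""

    def __init__(self, x, y, q=1.0, r=4.0):
        self.s = np.array([x, y, 0.0, 0.0])      # state: x, y, vx, vy
        self.P = np.eye(4) * 100.0               # large initial uncertainty
        self.F = np.array([[1.0, 0.0, 1.0, 0.0],
                           [0.0, 1.0, 0.0, 1.0],
                           [0.0, 0.0, 1.0, 0.0],
                           [0.0, 0.0, 0.0, 1.0]])  # one frame per time step
        self.H = np.array([[1.0, 0.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0, 0.0]])  # only the center is observed
        self.Q = np.eye(4) * q                   # process noise (assumed)
        self.R = np.eye(2) * r                   # measurement noise (assumed)

    def predict(self):
        """Prediction step: run before the group adjustment."""
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2].copy()

    def update(self, zx, zy):
        """Update step: correct the state with the adjusted object center."""
        innovation = np.array([zx, zy]) - self.H @ self.s
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.s = self.s + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.s[:2].copy()
```

The predicted center gives the adjustment a better starting point than the previous frame's position, which is how the filter reduces both tracking error and the number of adjustment iterations.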

Still referring to Fig. 10A, in the processing step 122, in order to
perform tracking of the objects that were detected in the previous video frames,
the parameters of the existing objects are adjusted. In the processing step 124,
new objects are created from all the high-value pixels that do not belong to the
already created objects. The adjustment of the object parameters is done for
every group of objects. The objects are divided into groups according to their
location: objects of a group are close to each other and might occlude each
other, while objects from different groups are distant from each other. The
adjustment of groups of objects provides for the appropriate handling of
occlusion situations.

Referring now to Fig. 10B, at step 126 the object groups are built.
An object-specific bounding ellipse represents each object. The functionality,
structure and operation of the ellipse will be described hereinafter in
association with the following drawings. Two objects are identified as
neighbors if the minimum distance between their bounding ellipses is up to
about 4 pixels. Using the neighborhood relations between every two objects,
the object groups are built. Note should be taken that static objects are not
adjusted. At step 128 the parameters of the existing dynamic objects are
adjusted in order to perform tracking of the objects detected in the previously
captured video frames. The objects are divided into groups according to their
locations. Objects of a group are close to each other and may occlude each
other. Objects belonging to different groups are distant from each other. The
adjustment of the object parameters is performed for every group of objects
separately. The adjustment of groups of objects enables appropriate handling of
occlusion situations. At step 126 groups of objects are built. Each object is
represented by a bounding marker, which is a distinct artificially generated
graphical structure, such as an ellipse. A pair of objects is identified as two
neighboring members if the minimum distance between their marker ellipses is
up to a pre-defined number of pixels. In the preferred embodiment of the
invention the pre-defined number of pixels is 4, while in other embodiments
different values could be used. Using the neighborhood relations between all
the pairs of objects, the object groups are built. At step 128 the object groups
are adjusted. The object group adjustment process determines the optimal
spatial parameters of each object in the objects group. Each set of spatial
parameter values of all the objects in a given objects group is scored. The
purpose of the adjustment process is to find the spatial parameters of each
object in a group such that the total score of the group is maximized. The
initial parameters are the values generated for the previously captured frame.
The initial base score is derived from a predictive Kalman filter. In each
adjustment iteration, a pre-defined number of geometric operations are
performed on the objects. The operations effect changes in the parameters of
every object in the group. Various geometric operations could be used, such as
translation, scaling (zooming), rotation, and the like. In the preferred
embodiment of the invention, the number of geometric operations applied to
the object is 10, while in other preferred embodiments different values could be
applied. In the preferred embodiment of the invention, the following
geometric operations with the respective values are used: a) translation right
on axis 1, b) translation left on axis 1, c) translation right on axis 2, d)
translation left on axis 2, e) down-scaling by shrinking axis 1, f) up-scaling by
stretching axis 1, g) down-scaling by shrinking axis 2, h) up-scaling by
stretching axis 2, i) rotation to the left through 5 degrees, and j) rotation to the
right through 5 degrees. The score of every change is measured and saved in a
table.
The structure and the constituent elements of the table are described via a 
In the example above there are 3 objects in the group, where each
row represents an object. The 10 adjustments performed on each object are
represented by the results shown in each row. Performing adjustment 1 on the
2nd object yields the maximum score for the group. Thus, adjustment 1 will be
applied to the parameters of the 2nd object. The score is weighted by the
non-movement time of the object. As a result, the algorithm tends not to
perform changes on objects that were not in movement for a significant period.
The iterative process is performed in order to improve the score of the group as
much as possible. The iterative process stops if at least one of the following
conditions is satisfied: a) the highest score found in the iteration is no greater
than the score at the beginning of the iteration, or b) at least twenty iterations
have been completed.

In order to reduce the computational load, every ellipse parameter is
changed according to its movement as derived by the Kalman filter used
to track the object. If the score of the group is higher than the base score,
the change is applied and the new score becomes the base score.
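The iterative group adjustment of steps 126 to 128 amounts to a greedy search over the ten geometric operations. The sketch below is illustrative only: the `Ellipse` fields, the step sizes (1 pixel, a 10% scale factor, 5 degrees), the reading of "left" rotation as counter-clockwise, and the names `candidate_ops` and `adjust_group` are assumptions. The scoring function is passed in, and the non-movement-time weighting and Kalman-driven moves are omitted.

```python
import math
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Ellipse:
    cx: float
    cy: float
    a1: float          # semi-axis 1
    a2: float          # semi-axis 2
    angle_deg: float   # direction of axis 1

def candidate_ops(e, step=1.0, scale=1.1, rot=5.0):
    """The ten geometric operations a) to j), applied to one ellipse."""
    t = math.radians(e.angle_deg)
    u1 = (math.cos(t), math.sin(t))          # unit vector along axis 1
    u2 = (-math.sin(t), math.cos(t))         # unit vector along axis 2
    return [
        replace(e, cx=e.cx + step * u1[0], cy=e.cy + step * u1[1]),
        replace(e, cx=e.cx - step * u1[0], cy=e.cy - step * u1[1]),
        replace(e, cx=e.cx + step * u2[0], cy=e.cy + step * u2[1]),
        replace(e, cx=e.cx - step * u2[0], cy=e.cy - step * u2[1]),
        replace(e, a1=e.a1 / scale), replace(e, a1=e.a1 * scale),
        replace(e, a2=e.a2 / scale), replace(e, a2=e.a2 * scale),
        replace(e, angle_deg=e.angle_deg + rot),
        replace(e, angle_deg=e.angle_deg - rot),
    ]

def adjust_group(group, score, max_iter=20):
    """Apply, per iteration, the single best operation to the single best
    object; stop when nothing beats the base score or after max_iter."""
    group = list(group)
    base = score(group)
    for _ in range(max_iter):
        best_score, best_group = base, None
        for k, e in enumerate(group):
            for cand in candidate_ops(e):
                trial = group[:k] + [cand] + group[k + 1:]
                s = score(trial)
                if s > best_score:
                    best_score, best_group = s, trial
        if best_group is None:               # no improvement: stop
            break
        group, base = best_group, best_score
    return group
```

Evaluating every operation on every member before committing to one change mirrors the score table described above, where only the best entry of the whole group is applied per iteration.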

In order to handle occlusions, a "united object" is built, which is the
union of all the objects in the group. Thus, each pixel that is associated with
more than one object in the group contributes its score only once and not
for every member object that wraps it. The contribution of each pixel in the
SAD map to the total score of the group is set in accordance with the value of
the pixel:
(9)
$$score(val) = \begin{cases} +2 & \text{HighTH} < val \\ +1 & \text{LowTH} < val < \text{HighTH} \\ -1 & val < \text{LowTH} \end{cases}$$
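Scoring a group through its "united object" and equation (9) can be sketched as follows. The ellipse parameterization (center, semi-axes, angle in degrees), the handling of values exactly equal to LowTH or HighTH (placed in the lower class), and the names `ellipse_mask` and `group_score` are assumptions.

```python
import numpy as np

def ellipse_mask(shape, cx, cy, a1, a2, angle_deg):
    """Pixels inside an ellipse with semi-axes a1, a2 (arrays indexed [y, x])."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    t = np.radians(angle_deg)
    dx, dy = xx - cx, yy - cy
    u = dx * np.cos(t) + dy * np.sin(t)          # coordinate along axis 1
    v = -dx * np.sin(t) + dy * np.cos(t)         # coordinate along axis 2
    return (u / a1) ** 2 + (v / a2) ** 2 <= 1.0

def group_score(sad_map, ellipses, low_th, high_th):
    """Score the united object of a group with equation (9)."""
    united = np.zeros(sad_map.shape, dtype=bool)
    for e in ellipses:
        united |= ellipse_mask(sad_map.shape, *e)  # overlapping pixels counted once
    vals = sad_map[united]
    # values equal to a threshold fall into the lower class (assumption)
    per_pixel = np.where(vals > high_th, 2, np.where(vals > low_th, 1, -1))
    return int(per_pixel.sum())
```

Because the union is formed before scoring, two ellipses covering the same occluded region cannot both collect credit for it, which is the point of the united object.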

Subsequent to the completion of the (about 10) iterations, specific
object parameters associated with each group object are tested against specific
thresholds in order to check whether the object should be discarded. The object
parameters to be tested are the minimum object area, the minimum middle
score, the maximum dead area, and the overlap ratio.

a) Minimum object area concerns a threshold value limiting the
minimum permissible spatial extent of an object. If the object area
is smaller than the value of a pre-defined threshold, the object is discarded.
Thus, for example, random appearances of non-real objects, or random dark
pixels caused by dirty lenses, are effectively cleaned.

b) Minimum middle score relates to the score of a circle that is
bounded in the ellipse representing the object. If the score of the circle is below
the pre-defined value of an associated threshold, the object is eliminated. A
low-score circle indicates a situation where two objects were in close proximity
in the scene, and were thus represented as one object (one ellipse), and then
separated. Thus, the middle of the ellipse will have a lower score than the rest
of the area of the object.

c) Maximum dead area concerns an object that includes a large
number of low-value pixels. If the number of such pixels is higher than an
associated threshold value, the object is discarded.

d) Overlap ratio regards occlusion situations. Occlusion is supported
up to about 3 levels. If most of the object is occluded by about 3 other objects
for a period of about 10 seconds, the object is a candidate to be discarded. If
there is more than one object in that group that should be eliminated, the
most recently moving object is discarded.

Subsequent to the completion of the parameter testing procedure,
the non-discarded objects are cleared from the SAD map by setting the value of
the set of pixels bounded in the object ellipse to zero. The discarded objects are
saved in the discarded objects archive to be utilized as object history. The data
of every new object will be compared against the data of the recently discarded
objects stored in the archive in order to provide the option of restoring the
object from the archive.
Referring now to Fig. 10C, consequent to the adjustment of the
existing objects, the pixels in the SAD map are provided with values in the
range of 0 through 100. A value of zero means that the pixel belongs to an
existing object. The drawing shows the steps in the creation of a new object.
The construction of a new object is based on a pixel having a high value in the
SAD map. The procedure starts by searching for a free entry in the objects table
74 of Fig. 6 in order to enable the storage of the parameters of a new object
(not shown). The high-value pixel is assumed to be the center of the object. In
order to derive the boundary of the new object, a specific boundary location
function, referred to hereinafter as the "spider function", is activated at step
130. The spider function includes a set of program instructions associated with
a control data structure. The control data structure contains location and other
data that define the spatial parameters of a spider-like graphical structure. The
spider-like structure is provided with about 16 extensible members (arms)
uniformly distributed across 360 degrees. The extensible members of the
spider-like structure are connected to the perceived center of the new object and
radiate radially outward. The length of each extensible member is successively
increased while the far end of the member is spatially aligned with a pixel
having a high value in the SAD map. In order to handle small gaps in the
object, "bridging" line segments of up to 4 pixels are allowed. Thus, if there are
more than 4 continuous low-value pixels in the direction of the radiation, the
extension of a member is discontinued. The member-specific final coordinates
are saved in X, Y arrays in the control data structure, respectively, in order to
indicate the suitable boundary points constituting the boundary line of the new
object. Next, in order to improve accuracy, the central point of the spider
structure is recalculated from the X, Y arrays, as follows:



(10)
$$X_c = \frac{1}{16}\sum_{k=0}^{15} X[k], \qquad Y_c = \frac{1}{16}\sum_{k=0}^{15} Y[k]$$

Then, subsequent to the re-location of the central point of the structure to
the Yc, Xc pixel coordinates, the spider structure is re-built. Extending the
about 16 extensible members of the spider structure yields two arrays, Y[16] and
X[16]. If the spatial extent of the spider structure is sufficient, the
parameters of the boundary ellipse are calculated. If the spatial extent of the
spider overlaps the area of an existing object, the new object will not be created
unless its size is above a minimum threshold.
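The spider function of step 130 can be sketched as below. Assumptions not in the source: SAD values at or above `high_th` count as high, arm positions are rounded to the nearest pixel, arms also stop at the image border or after `max_len` steps, arrays are indexed [y, x], the recentering uses the mean of the arm end points, and the names `spider` and `recenter` are hypothetical.

```python
import math
import numpy as np

def spider(sad_map, cx, cy, high_th, n_arms=16, max_gap=4, max_len=200):
    """Extend n_arms arms from (cx, cy); return arrays X, Y of arm end points.

    An arm grows over high-value pixels, bridges runs of up to max_gap
    low-value pixels, and stops at a longer run or at the image border.
    """
    h, w = sad_map.shape
    xs, ys = np.empty(n_arms), np.empty(n_arms)
    for k in range(n_arms):
        t = 2.0 * math.pi * k / n_arms          # arms spread uniformly over 360 degrees
        dx, dy = math.cos(t), math.sin(t)
        end, gap = 0, 0
        for r in range(1, max_len):
            x, y = int(round(cx + r * dx)), int(round(cy + r * dy))
            if not (0 <= x < w and 0 <= y < h):
                break
            if sad_map[y, x] >= high_th:
                end, gap = r, 0                 # arm reaches this pixel
            else:
                gap += 1
                if gap > max_gap:               # gap too long to bridge
                    break
        xs[k] = cx + end * dx
        ys[k] = cy + end * dy
    return xs, ys

def recenter(xs, ys):
    """Recalculate the central point from the arm end points."""
    return float(xs.mean()), float(ys.mean())
```

In use, `recenter` is applied to the first set of arm ends and `spider` is run again from the new center, matching the re-building step described above.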

Still referring to Fig. 10C, at step 132 the spider-like graphical structure
is converted to an ellipse-shaped graphical structure. An ellipse is provided
with 5 parameters calculated from the X, Y arrays as follows:



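(11)
$$\bar{X} = \frac{1}{16}\sum_{k=0}^{15} X[k], \qquad \bar{Y} = \frac{1}{16}\sum_{k=0}^{15} Y[k]$$
$$C_{xx} = \frac{1}{16}\sum_{k=0}^{15}\left(X[k]-\bar{X}\right)^2, \quad C_{yy} = \frac{1}{16}\sum_{k=0}^{15}\left(Y[k]-\bar{Y}\right)^2, \quad C_{xy} = \frac{1}{16}\sum_{k=0}^{15}\left(X[k]-\bar{X}\right)\left(Y[k]-\bar{Y}\right)$$

The covariance matrix of the ellipse is:

(12)
$$C = \begin{pmatrix} C_{xx} & C_{xy} \\ C_{xy} & C_{yy} \end{pmatrix}$$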

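The ellipse covariance matrix is scaled to wrap the geometric
average distance: the covariance matrix is multiplied by F, where F is
calculated in the following manner:
(13)
$$d_k = \begin{pmatrix} X[k]-\bar{X} & Y[k]-\bar{Y} \end{pmatrix} C^{-1} \begin{pmatrix} X[k]-\bar{X} & Y[k]-\bar{Y} \end{pmatrix}^{T}, \quad k = 0..15$$
$$F = \left(\prod_{k=0}^{15} d_k\right)^{1/16}$$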
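Converting the spider arrays into an ellipse can be sketched as follows. The geometric-mean form of F follows the phrase "geometric average distance" and is an assumption, as is deriving the semi-axes and orientation from an eigen-decomposition of the scaled covariance; `ellipse_from_spider` is a hypothetical name, and the sketch assumes no arm end coincides with the center (a zero d_k would make F zero).

```python
import numpy as np

def ellipse_from_spider(xs, ys):
    """Fit the bounding ellipse of a spider structure.

    Returns (center, scaled covariance, semi-axes [major, minor], angle in degrees).
    """
    pts = np.column_stack([xs, ys]).astype(np.float64)
    center = pts.mean(axis=0)                        # mean of X and Y arrays
    d = pts - center
    C = d.T @ d / len(pts)                           # [[Cxx, Cxy], [Cxy, Cyy]]
    m = np.einsum("ki,ij,kj->k", d, np.linalg.inv(C), d)   # d_k per arm end
    F = np.exp(np.mean(np.log(m)))                   # geometric mean of the d_k
    C_scaled = F * C
    evals, evecs = np.linalg.eigh(C_scaled)          # eigenvalues in ascending order
    semi_axes = np.sqrt(evals[::-1])                 # major, minor
    major = evecs[:, -1]                             # direction of the major axis
    angle = float(np.degrees(np.arctan2(major[1], major[0])))
    return center, C_scaled, semi_axes, angle
```

For arm ends lying exactly on an ellipse, every d_k equals 2, so F = 2 and the scaled covariance reproduces the original semi-axes; the geometric mean keeps a single long arm from inflating the ellipse as much as an arithmetic mean would.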



At step 134 the new object is adjusted via the utilization of the same
adjustment procedure used for adjusting existing objects. The discarded objects
archive includes recently discarded objects. If the spatial parameters, such as
location and size, of a recently discarded object are similar to the parameters of
the new object, the discarded object is retrieved from the archive and the
tracking thereof is re-initiated. If no similar object is found in the archive, then
the new object gets a new object ID, and the new object's data and meta
data are inserted into the objects table. Subsequently, tracking of the new
object is initiated.

Referring now to Fig. 11, the output of the clustering step is the set of
updated spatial parameters of the objects stored in the objects table. The scene
characterization layer 208 uses the existing objects to describe the scene. The
layer 208 includes program sections that analyze the changes in the spatial
parameters of the object, characterize the spatio-temporal behavior pattern of
the object, and update the properties of the object. The temporal parameters
and the properties of the object are suitably stored in the objects table. At step
210 object movement is measured. The measurement of the object movement is
performed as follows:
(14)
$$dV_x = \operatorname{sgn}(MeanX - PrevMeanX), \qquad dV_y = \operatorname{sgn}(MeanY - PrevMeanY)$$
$$AccMoveX = 0.5 \cdot AccMoveX + 0.5 \cdot dV_x, \qquad AccMoveY = 0.5 \cdot AccMoveY + 0.5 \cdot dV_y$$
$$AccDist = \sqrt{AccMoveX^2 + AccMoveY^2}$$



MeanX/Y is the location of the center of the object in the current
frame. PrevMeanX/Y is the location of the center of the object in the previous
frame. The value of the non-moving counter is updated in accordance with
AccDist as follows:
(15)
$$NonMoveCnt = \begin{cases} 0.95 \cdot NonMoveCnt & AccDist > 0.8 \\ NonMoveCnt + 1 & \text{otherwise} \end{cases}$$
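Equations (14) and (15) translate directly into a per-frame update. The `MotionState` container and the function names are hypothetical; the constants 0.5, 0.8 and 0.95 are those given above.

```python
import math
from dataclasses import dataclass

def sgn(v):
    """Sign of v as -1, 0 or +1."""
    return (v > 0) - (v < 0)

@dataclass
class MotionState:
    acc_move_x: float = 0.0
    acc_move_y: float = 0.0
    non_move_cnt: float = 0.0

def update_motion(st, mean_x, mean_y, prev_mean_x, prev_mean_y):
    """Apply equation (14), then equation (15); return AccDist."""
    dvx = sgn(mean_x - prev_mean_x)
    dvy = sgn(mean_y - prev_mean_y)
    st.acc_move_x = 0.5 * st.acc_move_x + 0.5 * dvx   # decaying direction memory
    st.acc_move_y = 0.5 * st.acc_move_y + 0.5 * dvy
    acc_dist = math.hypot(st.acc_move_x, st.acc_move_y)
    if acc_dist > 0.8:
        st.non_move_cnt *= 0.95                      # consistent motion: decay
    else:
        st.non_move_cnt += 1                         # effectively stationary
    return acc_dist
```

Because only the sign of each displacement is accumulated, jitter that flips direction from frame to frame keeps AccDist low, and only about three consecutive frames of motion in a consistent direction push it above 0.8.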

In the unattended luggage application, there is a possibility that a
standing or sitting person who does not make significant movements will
generate an alarm. In order to handle such false alarms, the algorithm checks
whether there is motion inside the object ellipse. If in at least 12 of the last 16
frames there was motion in the object, the object is considered a moving object.
Consequently, the value of the non-moving counter is divided by 2. At step 212
an object merging mechanism is activated. There are cases in which an element
in the monitored scene, such as a person or a car, is represented by 2 objects
whose ellipses are partially overlapping due to clustering errors. The object
merging mechanism is provided for handling this situation. Thus, for
example, if at least 2 objects are close enough to each other ("close" as defined
for the clustering process) and are moving with the same velocity for more
than 8 frames, then the two objects are considered as representing the same
element. Thus, the objects are merged into a single object and a new ellipse
is created to bound the merged objects. The new ellipse data is saved as
the spatial parameters of the older object, while the younger object is discarded.
Each merge is performed between 2 objects at a time. If there are more than 2
overlapping objects that move together, additional merges are performed.
Each object's spatio-temporal behavior pattern and other properties, such as
texture (including but not limited to color), shape, velocity, trajectory, and the
like, are characterized and compared against the pre-defined behavior patterns
and properties of "suspicious" objects. At step 214, the objects whose
behavior pattern and properties are similar to the "suspicious" behavior and
properties generate an alarm trigger. Note should be taken that the
suspicious behavior patterns and suspicious properties could vary among
diverse applications.

Referring now to Fig. 12, the background update layer updates the
reference images for the next frame calculation. The method uses two reference
images: a) the long-term reference image, and b) the short-term reference
image. The long-term reference image describes the monitored scene as a
background image without any objects. The short-term reference image
includes both the background image and static objects. Static objects are
defined as objects that do not belong to the background and have not moved in
the monitored scene for a pre-defined period. In the preferred embodiment of
the invention the pre-defined period is about 1 to 2 minutes. In other
embodiments, different time unit values could be used. The background
updating process uses the outputs of all the previous layers to generate a new
short-term reference image. Each pixel that satisfies the following conditions
is updated: a) it is similar enough to the short-term reference image
(according to the score given in the pre-processing step), and b) it is not
included in an object. Pixels that do not satisfy the first condition but
satisfy the second condition for a long sequence of frames are updated as
well. Every fixed number of frames, the current reference images are compared
to the previous reference images in order to check whether the changes made to
the reference images were correct. The long-term reference image is updated
from the short-term reference image in all pixels that are not contained in
any of the tracked objects. An object may change its status from dynamic to
static if it does not move for a given period. It can change its status from
static to dynamic if its score in the long-term reference image significantly
decreases.

The background maintenance can be augmented by user-initiated updates:
the user can add objects to the background in order to help the system cope
with changes in the background caused by a background object being moved.
For example, a "bench" object that was dragged into the scene will be
identified by the method as an object. The user can classify the object as a
neutral object and add it to the background, thereby preventing its
identification as a dynamic or a static object.

Still referring to Fig. 12, at step 198 the background draft frame is
updated. The background draft frame is continuously updated from the current
frame in all macro-blocks (16 × 16 pixels or the like) in which there was no
motion for several frames. Each pixel in the background draft is updated
using the following calculation:

$$\mathrm{BackgroundDraft}(x, y) = \mathrm{BackgroundDraft}(x, y) + \operatorname{sgn}\bigl(\mathrm{CurrentFrame}(x, y) - \mathrm{BackgroundDraft}(x, y)\bigr) \qquad (16)$$
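Equation (16) restricted to motionless macro-blocks can be sketched as follows, for illustration only. The sketch assumes grey-level images stored as 2-D Python lists and a per-macro-block count of consecutive motionless frames maintained by the caller; the function name update_background_draft and the value min_still (a stand-in for "several frames") are hypothetical.

```python
def update_background_draft(draft, frame, still_frames, block=16, min_still=5):
    """Apply equation (16) in macro-blocks that have been motionless long enough.

    draft, frame: 2-D lists of grey levels (rows of pixels); draft is updated in place.
    still_frames: 2-D list with, per macro-block, the number of consecutive
    frames without motion (assumed to be maintained by the caller).
    min_still: hypothetical stand-in for "several frames".
    """
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if still_frames[y // block][x // block] >= min_still:
                diff = value - draft[y][x]
                # sgn(diff): move the draft pixel one grey level toward the frame
                draft[y][x] += (diff > 0) - (diff < 0)
    return draft
```

Because each call moves a pixel by at most one grey level, the draft follows gradual, persistent changes in the scene while short transients barely affect it.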
When an object is identified as a static object, it is assumed that the
identified object already appears in the background draft. Thus, the pixels of
the object are copied from the background draft to the short-term reference
image. The short-term reference image is updated at step 200. The update of
each pixel in the short-term reference image is performed in accordance with
the values of the pixel in the SAD map and in the objects map. The update
calculations use the following variables:

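SAD(x, y) = the SAD map value at pixel location (x, y)

OBJECT(x, y) = the number of objects to which the pixel at location (x, y)
belongs

BACKGROUND_COUNTER(x, y) = a per-pixel counter of frames in which the pixel
was classified as background and did not belong to any object

NOT_BACKGROUND_COUNTER(x, y) = a per-pixel counter of frames in which the
pixel was classified as not background and did not belong to any object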

The counters defined above are updated by performing the following
sequence of instructions. If SAD(x, y) < 50 and OBJECT(x, y) = 0, then
according to the SAD map the pixel belongs to the background and does not
belong to any object; therefore, BACKGROUND_COUNTER(x, y) is incremented by
one. If SAD(x, y) > 50 and OBJECT(x, y) = 0, then the pixel does not belong to
the background and does not belong to any object; therefore,
NOT_BACKGROUND_COUNTER(x, y) is incremented by one. If OBJECT(x, y) is not
equal to 0, then there is at least one object to which the pixel belongs, so
both counters are set to zero.

After the counters are updated, the pixels are updated in accordance with
the counters. If BACKGROUND_COUNTER(x, y) is greater than or equal to 15, the
pixel at the (x, y) coordinates is updated and the counter is set to zero. If
NOT_BACKGROUND_COUNTER(x, y) is greater than or equal to 1000, the pixel at
the (x, y) coordinates is updated and the counter is set to zero.
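The counter logic of step 200 can be sketched per pixel as below, for illustration only. The thresholds 50, 15 and 1000 are taken from the text; the function name update_short_term_reference and the 2-D list representation are hypothetical, and "updating" a pixel is assumed here to mean copying its value from the current frame, which the description does not state explicitly.

```python
def update_short_term_reference(short_ref, frame, sad, objects_map,
                                bg_cnt, not_bg_cnt, sad_threshold=50,
                                bg_limit=15, not_bg_limit=1000):
    """One frame of the per-pixel counter logic described for step 200.

    All arguments are 2-D lists of equal size; short_ref and the two
    counters are updated in place. "Updating" a pixel is assumed to mean
    copying the current frame value into the short-term reference image.
    """
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if objects_map[y][x] != 0:
                # Pixel belongs to at least one object: reset both counters
                bg_cnt[y][x] = 0
                not_bg_cnt[y][x] = 0
                continue
            if sad[y][x] < sad_threshold:
                bg_cnt[y][x] += 1        # background, not in any object
            elif sad[y][x] > sad_threshold:
                not_bg_cnt[y][x] += 1    # not background, not in any object
            # SAD exactly equal to the threshold changes neither counter, as in the text
            if bg_cnt[y][x] >= bg_limit:
                short_ref[y][x] = value
                bg_cnt[y][x] = 0
            if not_bg_cnt[y][x] >= not_bg_limit:
                short_ref[y][x] = value
                not_bg_cnt[y][x] = 0
```

The asymmetric limits mean that background-like pixels are refreshed quickly, whereas a persistent change outside any object (for example a lighting change) is absorbed only after a long run of frames.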

At step 202 the long-term reference image is updated by copying all the 
pixels that are not bounded by any object's ellipse from the short-term 
reference image to the long-term reference image. 
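Step 202 can be sketched as a masked copy, for illustration only. The sketch assumes that each object's ellipse is stored as a hypothetical tuple (cx, cy, a, b, theta) of centre, semi-axes and rotation angle in radians, and that images are 2-D Python lists; the function names are hypothetical.

```python
import math


def inside_ellipse(x, y, ellipse):
    """True if pixel (x, y) lies inside or on the rotated ellipse (cx, cy, a, b, theta)."""
    cx, cy, a, b, theta = ellipse
    dx, dy = x - cx, y - cy
    # Rotate the offset into the ellipse's own axes
    u = dx * math.cos(theta) + dy * math.sin(theta)
    v = -dx * math.sin(theta) + dy * math.cos(theta)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0


def update_long_term_reference(long_ref, short_ref, ellipses):
    """Copy every pixel not bounded by any object's ellipse from short_ref to long_ref."""
    for y, row in enumerate(short_ref):
        for x, value in enumerate(row):
            if not any(inside_ellipse(x, y, e) for e in ellipses):
                long_ref[y][x] = value
    return long_ref
```

Pixels covered by a tracked object keep their previous long-term value, so static objects held in the short-term reference image do not leak into the object-free long-term background.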

The score of each static object is measured in the short-term reference
image and compared to the score obtained when the object became static. If the
current score is significantly lower than the previous score, it is assumed that
the static object has started moving. The status of the object is set to
"dynamic" and the pixels of the object are copied from the long-term reference
image to the short-term reference image. Thus, the object will be adjusted for
the next frame during the adjustment process.
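The two status transitions described above (dynamic to static after a period without movement, static to dynamic after a significant score drop) can be sketched together, for illustration only. The class ObjectState, the function update_status, the frame count static_after and the ratio drop_ratio that stands in for "significantly lower" are all hypothetical; the object's score is taken as an input because its computation is defined elsewhere in the description.

```python
from dataclasses import dataclass


@dataclass
class ObjectState:
    pixels: list                      # (x, y) pixel coordinates of the object
    status: str = "dynamic"
    still_frames: int = 0             # consecutive frames without movement
    score_when_static: float = 0.0    # score recorded when the object became static


def copy_pixels(dst, src, pixels):
    for x, y in pixels:
        dst[y][x] = src[y][x]


def update_status(obj, moved, current_score, short_ref, long_ref, draft,
                  static_after=1500, drop_ratio=0.5):
    """static_after: hypothetical, e.g. about 1 minute at an assumed 25 frames/s."""
    if obj.status == "dynamic":
        obj.still_frames = 0 if moved else obj.still_frames + 1
        if obj.still_frames >= static_after:
            obj.status = "static"
            obj.score_when_static = current_score
            # The object is assumed to be present already in the background draft
            copy_pixels(short_ref, draft, obj.pixels)
    elif current_score < drop_ratio * obj.score_when_static:
        # Significant score drop: the static object has started moving
        obj.status = "dynamic"
        obj.still_frames = 0
        # Restore the background behind the object from the long-term image
        copy_pixels(short_ref, long_ref, obj.pixels)
    return obj.status
```

Copying from the background draft on entry to the static state, and from the long-term image on exit, mirrors the two copy operations described for steps 198 to 202.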

Applications that could utilize the system and method of object
tracking will now be readily apparent to persons skilled in the art. These can
include crowd control, people counting, offline and online investigation tools
based on the events stored in the database, assisting in locating lost luggage
(loss prevention), restricting access of persons or vehicles to certain zones,
unattended luggage detection, detection of "suspicious" behavior of persons or
other objects, and the like. The applications are suitable for city centers,
airports, secure locations, hospitals, warehouses, borders and other restricted
areas or locations, and the like.

It will be appreciated by persons skilled in the art that the present
invention is not limited to what has been particularly shown and described
hereinabove. Rather, the scope of the present invention is defined only by the
claims, which follow.

CLAIMS

WHAT IS CLAIMED IS: 

1. An apparatus for the analysis of a sequence of captured images
covering a scene for detecting and tracking of moving and static
objects and for matching the patterns of object behavior in the
captured images to object behavior in predetermined scenarios, the
apparatus comprising the elements of:

at least one image sequence source for transmitting a sequence of
images to an object tracking program; and
an object tracking program comprising:

a pre-processing application layer for constructing a difference
image between a currently captured video frame and at least one
previously constructed reference image showing the background;
an objects clustering application layer for generating at least one
new or updated object from the difference image; and

a background updating application layer for updating at least one
reference image prior to processing of a new frame.

2. The apparatus of claim 1 wherein the object tracking program further
comprises a configuration application layer for initializing the
apparatus in accordance with user pre-defined parameters.

3. The apparatus as claimed in claim 2 wherein the configuration 
application layer comprises a reference image constructor, the 
reference image constructor comprising a current frame capture 
module for assigning a captured image as the reference image. 

4. The apparatus as claimed in claim 2 wherein the configuration
application layer comprises a reference image constructor, the
reference image constructor comprising a reference image loading
module for loading an existing reference image located on file as the
reference image.

5. The apparatus as claimed in claim 2 wherein the configuration
application layer comprises a reference image constructor, the
reference image constructor comprising a reference image learning
module for generating a reference image from a consecutive sequence
of captured images.
6. The apparatus as claimed in claim 2 wherein the configuration
application layer comprises a timing parameters definer for providing
time setting information.

7. The apparatus as claimed in claim 2 wherein the configuration 
application layer comprises the element of a visual parameters definer, 
the visual parameters definer for providing the geometry of the scene. 

8. The apparatus as claimed in claim 7 wherein the visual parameters 
definer comprises a camera tilt setting module for deriving camera tilt 
in accordance with measurements of an object located at different 

locations in the scene. 

9. The apparatus as claimed in claim 7 wherein the visual parameters
definer comprises a camera zoom setting module for defining the
maximum, the minimum and the typical size of the objects to be
tracked.

10. The apparatus as claimed in claim 7 wherein the visual parameters
definer comprises a region location definition module for defining the
location of at least one region-of-interest within the scene.

11. The apparatus as claimed in claim 7 wherein the visual parameters 
definer comprises a region type definition module for defining a 
region of interest in the scene. 

12. The apparatus as claimed in claim 7 wherein the visual parameters 
definer comprises an alarm type definition module for defining a 
region of interest as a trigger alarm region. 

13. The apparatus as claimed in claim 1 wherein the pre-processing 
application layer comprises: 

a current frame handler for obtaining a captured frame; 
a short term reference image handler for loading an existing short- 
term reference image; 



PCT/DL03/00097 

03/067884 . . 

a long term reference image handler for loading an existing long-term
reference image;

a pre-processor module for generating new short term and long term
reference images;

a short term difference image handler for updating the short term
reference image with the new short term reference image; and

a long term reference image handler for updating the long term
reference image with the new long term reference image.

14. The apparatus of claim 13 wherein the short and long term reference 
image handlers further provide the moments of the short and long term 
reference images. 

15. The apparatus as claimed in claim 1 wherein the clustering application 
layer comprises: 

an object merger module for correcting clustering errors by 
successive merging of at least two partially overlapping objects having 
the same motion vector for a pre-defined period of time; 

an objects group builder module for creating at least one group of at 
least two close objects; 

an object group adjuster module for determining the spatial 
parameters of each object in the at least one group; and 

a new objects constructor module for constructing a new object
based on the difference image.

16. The apparatus as claimed in claim 15 wherein the clustering 
application layer further comprises an object searcher module for 
locating discarded objects having spatial parameters similar to the 
parameters of the new object. 

17. The apparatus as claimed in claim 15 wherein the clustering
application layer further comprises a Kalman filtering module.

18. The apparatus as claimed in claim 15 wherein the clustering
application layer further comprises an object status updater module for
modifying the status of an object.

19. The apparatus as claimed in claim 1 wherein the objects clustering
application layer generates at least one new or updated object from the
difference image and at least one existing object.
20. The apparatus of claim 1 wherein the object tracking program further
comprises a scene characterization application layer for describing the
scene and for triggering an alarm based on comparing a behavior
pattern of the at least one existing object to the at least one pre-defined
behavior pattern or characteristic.
21. The apparatus as claimed in claim 20 wherein the scene
characterization application layer comprises an object movement
measurement module for analyzing changes in the parameters of the at
least one existing object and determining the at least one existing
object movement.

22. The apparatus as claimed in claim 20 wherein the scene
characterization application layer comprises an object merger module
for correcting errors in the at least one existing object and an alarm
triggering mechanism for determining whether an alarm is to be
triggered based on the patterns of the at least one existing object.

23. The apparatus as claimed in claim 1 wherein the background update
application layer comprises a background draft updater module for
updating the at least one reference image from the currently captured
video frame.

24. The apparatus as claimed in claim 23 wherein the background update
application layer further comprises a short term reference image
updater module and a long term reference image updater module for
maintaining the updated short term and long term reference images.

25. The apparatus as claimed in claim 1 further comprising an object
tracking control database, the database comprising:

at least one long term reference image, the at least one long term
reference image comprising a background image of the scene without
dynamic or static objects tracked by the apparatus; and
at least one short term reference image, the at least one short term reference
image comprising a background image of the scene with the dynamic
or static objects tracked by the apparatus.
26. The apparatus as claimed in claim 25 wherein the object tracking control
database further comprises:

an objects table comprising a list of the dynamic or static objects
tracked by the apparatus, each object being associated with object data and
object metadata;
a distance short term map and a distance long term map showing the
short-term and long-term reference images; and

a background draft comprising a changing image of the scene and
making up the reference image.

27. The apparatus as claimed in claim 26 wherein the object tracking control
database further comprises a discarded objects archive for storing
discarded objects.

28. A method for the analysis of a sequence of captured images showing
a scene for detecting and tracking of at least one moving or static
object and for matching the patterns of the at least one object behavior
in the captured images to object behavior in predetermined scenarios,
the method comprising the steps of:
capturing at least one image of the scene;

pre-processing the captured at least one image and generating a short
term difference image and a long term difference image; and
clustering the at least one moving or static object in the short term
difference and long term difference images and generating at least one
new object and at least one existing object.
29. The method as claimed in claim 28 further comprising the steps of
characterizing the visual scene and updating the background reference
image by updating the short term reference frame and the long term
reference frame.

30. The method as claimed in claim 28 further comprising the step of
configuring the object tracking program for providing at least one
reference image, at least one timing parameter and at least one visual
parameter.

31. The method as claimed in claim 28 further comprising the step of
configuring the object tracking program for setting at least one region
of interest.

32. The method as claimed in claim 28 further comprising the step of
configuring the object tracking program, said step comprising the steps
of:

constructing an initial short term reference image and an initial long
term reference image;

providing the object tracking program with the initial short term
reference image and the initial long term reference image;
providing timing parameters; and assigning visual parameters.

33. The method as claimed in claim 32 wherein the step of constructing
comprises creating the short term reference image and the long term
reference image from a captured image.

34. The method as claimed in claim 32 wherein the step of constructing
comprises creating the short term reference image and the long term
reference image from internally stored images.

35. The method as claimed in claim 32 wherein the step of constructing
comprises creating the short term reference image and the long term
reference image through a learning process utilizing a set of
sequentially ordered and captured images.

36. The method as claimed in claim 28 wherein the step of pre-processing
comprises the steps of:
obtaining the short term reference image;

obtaining the long term reference image;

obtaining a currently captured image;
generating a short term difference image from the short term
reference frame and the currently captured image; and

generating a long term difference image from the long term
reference frame and the currently captured image.
37. The method as claimed in claim 28 wherein the step of clustering
comprises the steps of:

building groups of clustered objects from at least two dynamic or
static objects in accordance with the relative locations of each of the at
least two dynamic or static objects;
adjusting the parameters of each of the at least two dynamic or static
objects clustered within each group; and

updating the parameters and status of each of the at least two
dynamic or static objects.
38. The method of claim 28 wherein the step of clustering comprises the
steps of predicting the motion of the at least one moving object by
predictive filtering and adapting the parameters of the at least one
moving object.

39. The method as claimed in claim 37 wherein the step of building
groups of clustered objects comprises the steps of:
measuring the distance between each of the at least two dynamic or
static objects;

determining neighborhood relations between each of the at least two
dynamic or static objects in accordance with the results of the
distance measurement;
clustering the at least two dynamic or static objects in accordance
with the determined neighborhood relations into distinct object
groups; and

adjusting the distinct object groups in order to determine the optimal
spatial parameters of each of the at least two dynamic or static objects
in the distinct object groups.

40. The method as claimed in claim 38 wherein the step of adapting the
parameters of the at least one moving object comprises the steps of:
locating the center of the at least one moving object; locating the
boundary points constituting the boundary line of the at least one
moving object; re-calculating the location of the center of the at least
one moving object; and inserting the at least one moving object into an
objects table.

41. The method as claimed in claim 40 further comprising the steps of
adjusting the spatial parameters of the at least one moving object and
retrieving objects similar to the at least one moving object from a
discarded object archive.

42. The method as claimed in claim 29 wherein the step of characterizing
comprises the steps of: measuring the movement of the at least one
moving object to determine the behavior of the at least one moving
object; merging spatially overlapping objects; and generating an alarm
trigger in accordance with the results of the behavior of the at least
one moving object or in accordance with the spatial or visual
parameters of the at least one moving object.

43. The method as claimed in claim 42 wherein the alarm trigger is
generated in accordance with the texture of the object.

44. The method as claimed in claim 42 wherein the alarm trigger is
generated in accordance with the shape of the object.

45. The method as claimed in claim 42 wherein the alarm trigger is
generated in accordance with the velocity of the at least one moving
object.

46. The method as claimed in claim 42 wherein the alarm trigger is
generated in accordance with the trajectory of the at least one moving
object.

47. The method as claimed in claim 28 wherein the step of updating the
background comprises the steps of: updating the background draft;
updating the short term reference image; and updating the long term
reference image.